Structural Determination of a Transcribing RNA Polymerase II Complex
Introduction 


The goal of the proposed research is to determine, at atomic resolution, the X-ray
structures of RNA polymerase II in the midst of transcription, both alone and in complex
with regulatory proteins. The purpose of the proposed research is to provide a structural
basis for understanding the mechanism of transcription, the regulation of the process, and
the altered regulation that occurs in tumor cells.

The transcription mechanism, in which RNA polymerase II (RNAPII) is the key player,
appears to be universal. The human RNAPII subunits tested were found able to replace
their yeast counterparts in vivo (1). Therefore, studies of yeast RNAPII may be expected
to reveal general principles of eukaryotic transcription and its regulation. The yeast
enzyme is especially suited for 3-D structural analysis because a large amount of pure
material can readily be obtained from yeast cell culture. Yeast RNA polymerase lacking
subunits 4 and 7 was shown to be more homogeneous than the wild-type enzyme (2,3) and was
therefore used for structural studies. The effort to crystallize the polymerase in the
midst of transcription has also required this project to aid in solving the structure of
the polymerase alone (see below). After many years of effort this has recently been
achieved, and a backbone structure of the ten-subunit enzyme has been determined (4).

In breast cancer research there are two major routes of study. The first is to use methods
at hand, or to develop new methods, to intervene directly and eradicate tumor cells, by
invasive or non-invasive means. The second is to advance our knowledge of the disorder
itself. The current research is of this nature. Cancer cells differ from normal cells in
that their regulation is altered. The key point of regulation at the cellular level is
transcription. Indeed, mutations in tumor suppressor genes such as p53 and inherited
mutations in the breast and ovarian cancer susceptibility gene, BRCA1, are directly
associated with breast cancer.

Mutations can alter cellular traits by affecting the regulation of specific genes at
either the initiation or the elongation stage of transcription by RNA polymerase II.
During elongation, RNA polymerase II pauses during synthesis of its transcript. Proteins
such as TFIIS can regulate the amount of read-through. Indeed, an additional elongation
factor, SIII, has been shown to be a target of the VHL tumor suppressor protein, which is
able to directly regulate its function (5). Mutations in the VHL gene predispose
individuals to a variety of tumors (6). This is a clear case of point mutations in a gene
directly affecting the regulatory mechanism.

In the case of breast cancer there has been growing evidence that altered regulation
occurs at the level of RNA polymerase II transcription initiation as well as elongation.
Recently, p53 has been shown to regulate CAK kinase activity. CAK kinase is a
component of the basal transcription factor TFIIH and was found to be necessary for the
CTD phosphorylation of RNA polymerase II that allows elongation of transcription (7). In
addition, p53 has been shown to interact directly with the general transcription factor
TFIID and its TATA box-binding protein component (8). Inhibition of RNA polymerase II
is a possible trigger for the p53 response (9). Another example is BRCA1, which was
found to be a component of the RNA polymerase II holoenzyme complex (10). It was
further shown to activate transcription when linked to a DNA-binding domain (11).

The most efficient means of generating RNA polymerase II elongation complexes is with
the aid of tailed oligonucleotide templates. Initiation on a single strand protruding from
the 3'-end of duplex DNA does not require accessory factors and allows for highly
efficient generation of functional elongation complexes (12). Such a "tailed" template may
be viewed as half of an unwound "bubble", which occurs at the active site of RNA
polymerase molecules during transcription. Consistent with this idea, transcription starts
within the single stranded region, about three bases from the junction with duplex DNA,
in both tailed templates and in the unwound bubble of an elongation complex (12). At
least two possible paused complexes can exist. The first is halted or paused due to the lack
of a single nucleotide such as UTP (13). The second is arrested even in the presence of all
nucleotides, due to the DNA structure arising from its primary sequence (14). Although
structural determination of the ternary complex in a functional state has not been
reported to date, the use of tailed templates has allowed the applicant to develop a
system with appropriate templates for the generation, purification and crystallization of
this complex (15).

In previous reports under this grant, the means of generating RNA polymerase II
elongation complexes were described. Briefly, initiation on a single strand of DNA
protruding from the 3'-end of duplex DNA allowed for the efficient generation of
elongation complexes (15). With such "tailed" templates, transcription starts within
the single stranded region, about three bases from the junction with duplex DNA, in both
tailed templates and in the unwound bubble of an elongation complex (13). Elongation
complexes were "halted" on tailed templates by transcription in the absence of UTP, so
that the polymerase halted when the first T residue in the template was reached. The
halted complexes generated on the tailed templates are advantageous for crystallization
because of the uniformity of their DNA and RNA content (15).

Despite the difficulties in determining the structure of the polymerase alone, successful
crystallization and diffraction of elongation complexes were achieved under this grant. In
addition, initial success in generating polymerase-TFIIS co-crystals was achieved.
Taken together with the generation of a structural model of the polymerase alone, almost
all the work required to generate the structures in question has been completed.

Many substantial achievements took place during the last year of support from the breast
cancer initiative.

1. Collection of a complete data set to 3.2 Å from a transcribing RNA Polymerase II
complex.

Previously, the best elongation complex crystals were plate crystals belonging to
space group C2. They diffracted to a limit of 6.0 Å with a high mosaic spread.
Furthermore, when soaked with heavy metals and/or cryoprotectant, they cracked. Since
the size of the polymerase made diffraction in a synchrotron beam necessary,
cryosoaking followed by freezing of the crystals became a crucial step prior to
data collection. A search for new crystal forms was therefore undertaken
during the current funding period, though this search was unsuccessful.

During the search for improved crystal forms employing template
9Pause (figure 1), it was found that improved diffraction could be obtained from
the plate crystals by changing the cryosoaking conditions. After crystallization, the
plate elongation complex crystals were gradually transferred from their mother liquor
(16% PEG 6000, 390 mM ammonium/sodium phosphate pH 6.0 and 5 mM DTT) into cryosoaking
buffer over a 10-16 hour period. The cryosoaking buffer contained 350 mM sodium chloride,
100 mM MES pH 6.2, 18% PEG 400, 15.5% PEG 6000, 50 mM dioxane and 3 mM DTT. Crystals were
then kept at 4°C for 4-12 days and frozen in liquid nitrogen. As a result, one in 15
crystals diffracted to better than 4 Å. Since many of the plate crystals were either
twinned or cracked, a large number of crystals were screened. This proved to be a
successful strategy: a complete data set to 3.2 Å from the plate crystals (C2 form) was
collected at SSRL beamline 9-2, which remains a major achievement of this project
(figure 2). The mosaic spread of 0.7° for this crystal is a great improvement over the
original mosaic spread of 1.5° for the CHESS 6 Å data set (figure 2). Two diffraction data
sets were taken from this crystal, the first a native data set and the second taken at the
zinc anomalous peak. All data were processed within the CCP4 program suite unless
otherwise indicated in the text. Data were initially processed with DENZO, followed by
scaling with SCALEPACK. Conversion from HKL to MTZ format was done with Scalepack2mtz,
followed by conversion of I to F with Truncate. Cad was then used to scale data and
Scaleit to scale different data sets when necessary.
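
The screening burden implied by the one-in-15 success rate quoted above can be estimated
with a short calculation. The following Python sketch is illustrative only: it assumes
that each crystal is an independent trial with the same probability of diffracting well,
and the 95% confidence target is an arbitrary choice that is not taken from this report.
The function name is hypothetical.

```python
import math


def crystals_to_screen(hit_rate: float, confidence: float) -> int:
    """Smallest number of crystals n for which the chance of finding at least
    one well-diffracting crystal is >= confidence.

    Assumes independent trials: P(at least one hit) = 1 - (1 - hit_rate)**n.
    """
    if not (0.0 < hit_rate < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("hit_rate and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


# Hit rate from the text (one in 15); the 95% target is an assumption.
n_needed = crystals_to_screen(1 / 15, 0.95)
print(f"Crystals to screen for a 95% chance of one good crystal: {n_needed}")
```

Under these assumptions several dozen crystals must be screened before a good data set
becomes likely, which is consistent with the large-scale screening described above.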

2. Strategy for Structural Determination of the Elongation Complex and an RNA
Polymerase II Mainchain Model

In order to determine a protein's structure by X-ray diffraction, one
needs good diffraction data and phase information.

Since crystal growth and diffraction of the elongation complex turned out to be a
massive undertaking, it was necessary to develop an efficient strategy to obtain both
good diffraction data and phase information. Although the plate crystals were fragile and
difficult to grow, the recent improvements allowed successful data collection
as described above (section 1).

Obtaining good phase information, though, has proven to be an equally challenging problem.
Firstly, determining phases using heavy metals requires a fair number of high quality
crystals, and very few good elongation crystals were successfully diffracted. Secondly,
elongation plate crystals (C2 form) were fragile and cracked in the presence of heavy
metals.

It was therefore decided that the most efficient strategy would be to perform
molecular replacement with a model of the polymerase alone, using the data from the
elongation complex. This project was performed in collaboration with co-workers
in Roger Kornberg's laboratory, and a fair amount of effort therefore went into
helping attain a model of the polymerase. Indeed, a mainchain model of the
polymerase has recently been achieved (4). This model is not ideal for molecular
replacement, however, since it consists mostly of polyalanine. Despite this, it was
used for molecular replacement with the elongation complex data in an attempt to extract
whatever information this technique could provide. A refined model should be made
available in the near future, and the molecular replacement will then be repeated.

In  the  model,  the  two  largest  subunits  of  RNA  polymerase  II  form  a  central  core 
and  the  remaining  subunits  are  peripheral  to  the  core.  A  long  cleft  in  the  molecule  is 
ideally  suited  to  fit  nucleic  acids.  One  side  of  the  channel  is  composed  of  a  domain 
consisting of subunits 1, 2 and 6 and is termed the clamp domain (4).

3. Molecular replacement with the elongation complex data reveals a key movement of a
clamp domain.

The mainchain model of RNA polymerase II was employed for molecular
replacement with the data from the plate C2 elongation crystal. The AMoRe molecular
replacement programs were used initially, and the CNS (X-PLOR) programs were used to
confirm the results. The solutions found in both cases were nearly identical.

Structure factors were then generated using Sfall (CCP4 suite) from the model
shifted into the position of the molecular replacement solution, together with the scaled
diffraction data from the elongation complex plate crystal (C2 form). Finally, electron
density maps were generated using the program FFT. Density maps were generated for the
native data and for the anomalous data taken at the zinc edge. In addition, an Omit
electron density map was generated using the CNS program Omit.

The mainchain model of RNA polymerase II has eight zinc atoms. The
anomalous signal from the C2 elongation complex data allowed the zinc atoms in the
elongation complex to be located. Comparison with the model revealed a clear shift in the
positions of four of the eight zinc atoms, indicating that protein domains have shifted.
Three of the zinc atoms are found in a single domain that we now call the "clamp" domain.
One of the zinc atoms in the clamp shifted by about 16 Å, which is quite substantial.
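
The size of a shift such as the one described above is simply the distance between a zinc
position in the polymerase model and the position of the corresponding anomalous peak. A
minimal Python sketch of this comparison follows. The coordinates and site labels are
invented purely for illustration (they are not the experimental values), and the function
names and the 5 Å threshold are hypothetical.

```python
import math


def zinc_shift(model_xyz, observed_xyz):
    """Euclidean distance (in Å) between a model zinc site and its observed peak."""
    return math.dist(model_xyz, observed_xyz)


def shifted_sites(model_sites, observed_sites, threshold=5.0):
    """Labels of zinc sites whose displacement exceeds the threshold (in Å)."""
    return sorted(
        label
        for label, xyz in model_sites.items()
        if zinc_shift(xyz, observed_sites[label]) > threshold
    )


# Made-up orthogonal coordinates in Å, for illustration only.
model_sites = {"Zn_a": (10.0, 20.0, 30.0), "Zn_b": (50.0, 60.0, 70.0)}
observed_sites = {"Zn_a": (10.0, 29.6, 42.8), "Zn_b": (50.5, 60.0, 70.0)}

for label in model_sites:
    shift = zinc_shift(model_sites[label], observed_sites[label])
    print(f"{label}: shifted by {shift:.1f} Å")
print("Sites moved by more than 5 Å:", shifted_sites(model_sites, observed_sites))
```

Sites that move together by a similar amount, as three of the zinc atoms do here, are
what one would expect from the rigid-body movement of a single domain.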

The position of the clamp domain, which has moved from its position in the model,
was confirmed by two means. Firstly, the clamp domain was manually moved to align the
zinc atoms of the model with the zinc atoms observed from the anomalous zinc signal.
Secondly, the shifted clamp domain was observed to fit well with the electron density of
the elongation complex Omit map (figure 3).

As mentioned above, the clamp domain composes one side of the long DNA
channel. In the RNA polymerase II model, the clamp is found in an open conformation,
and in the elongation complex it is in a closed conformation. In the closed conformation, a
tight-binding clamp is formed on DNA placed in the channel, and the DNA has no room to
dissociate from the polymerase. In the open conformation, double stranded DNA has
more than sufficient room to either enter or exit the channel. This can be observed by
placing a B-form double stranded DNA molecule in the cleft of the model structure in both
the open conformation and the closed conformation of the elongation complex (figure 4). In
addition, the clamp in the elongation complex appears to be situated in a position that
allows for direct interaction of the clamp domain with the nucleic acids (figure 4).

The closing of the clamp during transcription and the mobility of this domain help
explain the ability of RNA polymerase II to be a processive enzyme. Transcription of long
genes, such as that of the blood clotting protein factor VIII, requires thousands of bases
to be transcribed without the polymerase disengaging. A mechanical clamp, as we observe in
the elongation complex, is the mechanism of choice. On the other hand, for the DNA to
move rapidly through the enzyme, a degree of freedom is required. This freedom of
movement is observed in the clamp domain. The clamp domain could also serve as
a target for other functions. Release factors would have to be involved in "prying"
open the clamp domain prior to the release of the DNA.

Disclosing such important features of the polymerase is a good beginning in
understanding the regulation of transcription. The next stage is to understand where
regulatory factors bind on the enzyme. A factor binding the clamp domain could have a
significant effect on transcription, either by causing pausing during transcription or by
causing release of the polymerase from the DNA. It is likely that such regulatory factors
are present in differing amounts in breast cancer cells and in normal cells, and may be
partly responsible for the altered regulation of transcription.

In previous studies, the polymerase in these crystals was shown not only to contain
DNA but also to maintain an elongation-competent conformation (15). No
substantial electron density fitting nucleic acids was observed from the molecular
replacement results. This strongly suggests that the DNA in the C2 plate crystal form may
be disordered to some degree. Two possible sources of disorder exist. Firstly, initiation
at three different positions, generating RNA of three different sizes, may be partly
responsible. Secondly, the polymerase is known to "backslide" during transcription. At
that point, the active site disengages from the 3' end of the RNA and the polymerase
slides back on the DNA. If the end position of the backsliding is not homogeneous in all
or most of the elongation complexes, multiple conformations may exist and the nucleic
acid density would be difficult to observe. In this case high quality phase information
may be necessary to visualize the nucleic acids. For this reason, it is essential for the
molecular replacement model to be as accurate as possible. Such a model should be
completed shortly, and the conclusions of the molecular replacement will be made known.

4.  Improvement  of  the  biochemical  system  allowing  for  more  homogeneous  elongation 
complexes 

Another means of generating better diffracting crystals is to change/improve the
current  templates.  By  changing  the  template  size  and  homogeneity,  a  different  and 
perhaps  better  crystal  form  may  be  generated.  It  is  essential  to  recall  that  on  the  tailed 
templates,  initiation  began  at  -3,-4  and  -5  relative  to  the  double  stranded  junction.  This 
meant  that  although  all  the  RNA  was  halted  at  a  single  base  because  of  the  withholding  of 
UTP,  there  still  existed  3  species  of  RNA  differing  in  size  from  1  to  3  bases  in  length. 
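
The heterogeneity described above can be made concrete with a small bookkeeping sketch in
Python. It assumes a simplified linear coordinate in which the RNA length is the halt
position minus the start position; the halt position of +20 is a made-up value that is
not taken from the template sequences in figure 1, and the function name is hypothetical.

```python
def halted_rna_lengths(start_sites, halt_position):
    """Distinct RNA lengths produced when transcripts that begin at different
    start sites all halt at the same template position.

    Positions are integers relative to the double stranded junction
    (negative values lie in the single stranded tail). This is a simplified
    linear coordinate chosen for illustration only.
    """
    return sorted({halt_position - start for start in start_sites})


# Start sites from the text; the halt position is a hypothetical value.
lengths = halted_rna_lengths([-3, -4, -5], halt_position=20)
print(f"{len(lengths)} RNA species with lengths {lengths}")

# A single forced start site yields a single RNA species.
print(halted_rna_lengths([-3], halt_position=20))
```

Each additional start site adds one RNA species to the population even though every
complex halts at the same base, which is why forcing initiation at a single site is the
aim of the new templates.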

To overcome this problem a series of new templates was generated. The objective
was to allow for initiation at a single site while maintaining a structure as close to the
native elongation bubble as possible.

4a. RNA Polymerase II elongation complexes are mobile during initiation on tailed
templates.

Initiation on tailed templates occurs primarily at bases -3, -4 and -5 from the double
stranded-single stranded junction. In these reactions UTP is withheld to allow for efficient
pausing of the polymerase. Therefore, cytidine bases at -4, -5 and -6 (relative to the
double stranded junction) in the single stranded polyC tail of template 9Pause were
changed to A residues, resulting in template 10Pause (figure 1). In transcription reactions
it was intriguing to observe that the polymerase was unable to initiate at -4 and -5 on
template 10Pause and instead initiated at -2 and -1 (figure 5). This is evidence that,
during initiation on tailed templates, the complexes are not fixed in place and the
position of the active site is non-homogeneous. Indeed, the appearance of more than one
initiation site in all tailed templates to date is evidence of such mobility. It is
apparent that the polymerase has a kinetic preference to maintain the active site position
at bases -3, -4 and -5 but is in dynamic motion, with a less preferred forward sliding as
observed with template 10Pause.

The presence of ATP, whether added before or after pausing, was mostly
inconsequential, though some minor differences in the elongation pattern were observed.
Compare lanes 9P, Eb and Ea in figure 5 for an example.

4b.  Shortening  the  template  size 

Previously (15) it was shown that to maintain an active, yet paused, elongation
complex, between 16 and 22 bases upstream of the pause site in the non-coding strand are
needed. Comparing transcription of templates 9Pause and 11Pause establishes that 18 bases
downstream of the pause site are sufficient (figure 5).

4c.  Mimicking  the  non-template  strand  DNA  in  the  transcription  bubble 

During transcription a 7-9 base DNA-RNA hybrid exists. In turn, some DNA on the
non-template strand is displaced to form the transcription bubble and remains in a single
stranded form. Mimicking this may add stability to the complex and/or allow it to
become more like the form found in vivo. Templates 12P and 13P were designed with 5
non-homologous bases in the 5' region of the non-template strand, which would therefore
be unable to hybridize with the template strand, mimicking a transcription bubble (figure
1). In transcription assays (figure 5, Gel B) the overall efficiency of transcription
remains the same; however, a difference in the elongation pattern is observed when
comparing templates 10P and 12P. Some "faulty" read-through seen in figure 5 with
template 10P is not seen with template 12P, where the 5 base 5' overhang exists. This
could be due to increased stability resulting from binding to the 5 base overhang in the
nontemplate strand.


4d.  Forcing  initiation  at  a  single  base  and  preventing  leakage  from  the  pause  site 

It was thought that improving the homogeneity of the RNA species might allow for
improved crystal diffraction. As mentioned above in section 4a, the polymerase can initiate
at -2 and -1 relative to the double stranded junction on tailed templates when prevented
from initiating at -4 and -5. This was also observed by comparing the paused
transcription patterns of template 13P with those of 12P and 14P (figure 6, Gel C). 15P
initiates at -3 and -2, and 14P initiates at -2 and -1, because they were prevented from
initiating below -4 or below -3, respectively (see sequences, figure 1). It is also observed
that the paused elongation patterns obtained with forced initiation are more homogeneous
in length than those obtained without it (figure 6, Gel C, compare 13P with 14P).

Finally, a transcription system could now be designed for initiation at a unique site.
Template 17Pause (figure 1) comprises a tailed template and an RNA primer. The RNA
primer was made 9 bases long because that is also the length found in native elongation
complexes. The tail sequence was made homologous with the RNA primer and contained
4 adenosine bases immediately before the double stranded region. The nontemplate
strand was designed with a 3 base 5' nonhomologous sequence, in keeping with the idea
of mimicking a transcription bubble, with a distance of greater than 10 Å, possibly
allowing it to reach its native binding domain. In transcription reactions, a deoxyUTP
(dUTP) chain terminator was employed in addition to ATP, CTP and GTP. This had a two-fold
advantage. Firstly, initiation could not start from within the A bases immediately before
the double stranded region. Secondly, contamination of reactions by UTP would otherwise
lead to some read-through, resulting in a non-homogeneous mixture of RNA species, which is
not ideal for crystallization; dUTP, as a chain terminator, prevents this from
happening. Indeed, the combination of these approaches succeeded in attaining the goal of
a nearly completely homogeneous RNA species. The tailed template without the RNA primer
(figure 6, Gel D, 17P-RNA) initiated very poorly. It is also evident that the reactions in
figure 6, Gel D, were contaminated with some UTP, since synthesis went beyond the pause
site in both 17P and 17P-RNA. The problem of contaminating UTP is non-existent, though,
when dUTP is employed, and an almost completely homogeneous RNA band is observed
(figure 6, Gel D, 17P, lane E dU). This template is greatly improved over the original
tailed templates. Crystallization could be performed in the presence of magnesium, the
A, C and G nucleotides and dUTP without fear of misincorporation. Indeed, these paused
elongation complexes were employed for crystallization and resulted in a much better
diffracting crystal form (see below).

5.  Intrinsically  arrested  elongation  complexes. 

Intrinsically arrested elongation complexes contain polymerase that is in the midst of
transcribing an RNA strand yet is unable to continue even in the presence of all four
nucleotides. It was previously reported that adding a polyT region at a specific site to a
tailed template allows nearly 100% of the polymerase to become arrested (15). Many of
the templates generated under the current grant support contain such sites. The polyT
stretches are located immediately after the pause sites in most templates. For example,
template 14P in figure 6, Gels C and D, allows nearly 100% arrest immediately after
the pause site. Careful inspection, however, leads us to conclude that arrest does
not occur at a single base but is spread over 4 or 5 bases. In figure 6, Gel D, compare
template 14P lane P, which has nearly a single paused complex, with lane E, where,
although most of the polymerase has arrested, the arrest is non-uniform. Indeed, it is
unclear at this point whether it is possible to generate a complex that is homogeneously
arrested at a single base. In addition, the challenges involved in arriving at the paused
complex structure have taught us that a very homogeneous RNA species may be a key
prerequisite for structural determination. Since this is not currently possible with the
intrinsically paused (arrested) complex, crystallization of the current arrested complexes
would probably not yield useful information.

6.  New  elongation  complex  crystals  with  improved  diffraction. 

Template 17Pause was then employed in growing crystals. A screen was set up with
the PEG 6000 concentration as the only variable. Within 1 week, crystals with plate-like
morphology grew at 14 and 15 percent PEG 6000. These were then cryosoaked and frozen
and proved to be related to the C2 elongation crystal form. Between two weeks and one
month, an additional crystal form was observed at lower PEG 6000 concentrations (12-13%);
it appeared morphologically similar to the native enzyme crystals. After
cryosoaking and freezing, these crystals diffracted well.

Most crystals diffracted to 4 Å or better and were isomorphous to the I222 native
crystal form with the shorter a axis (Form2, see below). A full data set was taken at a
wavelength of 0.98 Å and is complete to 3.1 Å; a sample diffraction pattern is shown in
figure 7. The wavelength of 0.98 Å is in the tail of the zinc anomalous signal, and using
phases from form1 with the anomalous difference signal of the new elongation complex
allowed for immediate localization of the 8 zinc atoms in the I222 elongation
crystal. This crystal form is a marked improvement over the C2 plate crystal form. Firstly,
2/3 of the crystals tested diffract to 4 Å or better, whereas only a few C2 plate crystals
diffracted to 4 Å. This allows for consistent diffraction data collection. Secondly, the
I222 elongation crystals were easily manipulated with little observed physical damage, as
opposed to the extreme sensitivity of the C2 plate crystals. Most importantly, they are
closely related to the native polymerase structures. This has allowed for direct
visualization of the zinc atoms. In addition, since the native crystal form structures
will shortly be refined, their phases could be used directly with the data, possibly
eliminating the need for molecular replacement.

The new I222 elongation crystal form required slightly lower PEG 6000 and much
more time to grow. It was therefore necessary to determine whether the crystals indeed
still contained an elongation complex. For this purpose a non-denaturing NuSieve agarose
gel was used. Previously (15) it was shown that polymerase alone, polymerase and DNA
(binary complex), and polymerase, DNA and RNA (elongation complex) migrate
differently on non-denaturing NuSieve agarose gels. I222 elongation crystals were
washed in mother liquor, dissolved and loaded onto a NuSieve agarose gel. In figure 8,
polymerase from the I222 crystals clearly migrates differently from polymerase alone or
the polymerase binary complex. A very small amount of the elongation complex appears
dissociated, which sometimes occurs after harsh handling of elongation complexes,
such as the rapid pipetting needed to dissolve the crystals.

7.  Unwinding  of  the  double  stranded  region  is  crucial  in  determining  the  site  of  initiation 
on  tailed  templates. 

Since the site of initiation is set relative to the distance from the double stranded DNA,
the first two bases of the double stranded region were altered in the template strand from
AA to GG. The idea behind the change is that the tighter base pairing GG/CC would
lessen the ability of the DNA template to unwind. Indeed the degree of opening of the
double stranded region may determine the precise initiation site on tailed templates.
Templates 12P and 15P differed only in that 15P had the tighter binding GG/CC at the 5'
end of the double stranded region (figure 6). Indeed the two base change dramatically
collapsed the paused transcription pattern (figure 6, Gel C, compare 12P and 15P). The
pattern indicates that initiation at -1 and -2 was nearly abolished by the "tighter" closed
DNA.
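
The difference between templates 12P and 15P can be checked directly against the sequences listed in figure 1. The following is a minimal sketch, assuming the strands are taken exactly as printed there (top strand 5' to 3', bottom strand 3' to 5', with the short CACAC fragment shown above each template left out). The function names are illustrative only. The sketch uses the standard hydrogen bond counts for Watson-Crick pairs (two for A-T, three for G-C) to illustrate why the altered junction is expected to be harder to open.

```python
# Duplex strands as printed in figure 1 (top strand 5'->3', bottom strand 3'->5').
TOP = {
    "12P": "AAGACCAGGCATTTTTTCTTGTTGCGGAA",
    "15P": "CCGACCAGGCATTTTTTCTTGTTGCGGAA",
}
BOTTOM = {
    "12P": "CCCCCCAAACCCTTCTGGTCCGTAAAAAAGAACAACGCCTT",
    "15P": "CCCCCCAAACCCGGCTGGTCCGTAAAAAAGAACAACGCCTT",
}

# Hydrogen bonds per Watson-Crick pair: A-T has two, G-C has three.
H_BONDS = {frozenset("AT"): 2, frozenset("GC"): 3}


def duplex_pairs(name):
    """Return (top, bottom) base pairs, starting at the tail/duplex junction."""
    top, bottom = TOP[name], BOTTOM[name]
    tail_length = len(bottom) - len(top)  # unpaired tail on the bottom strand
    return list(zip(top, bottom[tail_length:]))


def junction_h_bonds(name, n_pairs=2):
    """Count hydrogen bonds in the first n_pairs base pairs of the duplex."""
    return sum(H_BONDS[frozenset(pair)] for pair in duplex_pairs(name)[:n_pairs])


def differing_positions(a, b):
    """0-based positions at which two equal-length strands differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


for name in ("12P", "15P"):
    print(name, duplex_pairs(name)[:2], junction_h_bonds(name))
print("bottom strand differences:", differing_positions(BOTTOM["12P"], BOTTOM["15P"]))
```

With the figure 1 sequences, the two templates differ only at the two base pairs adjacent to the tail, which is the comparison made in Gel C of figure 6.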

A possible mechanism for the patterns of initiation on tailed templates can now be
proposed, based on the structural information from the elongation complex. A dramatic
movement of the clamp domain has been observed in the elongation complex. In addition
it is highly probable that the clamp directly binds the DNA (figure 3B, notice the alpha
helix in close proximity to the DNA). Once the clamp is bound to the tailed template,
movement of the clamp away from the cleft could then unwind the double stranded
region to some degree. When initiation begins on the tailed template, several structural
states would exist. The clamp bound to the non-template strand, for example, could melt
the DNA to varying degrees while it moves away from the cleft. Indeed it may oscillate
back and forth while bound to the non-template strand due to forces of the DNA
rehybridizing and pulling the arm closed. Initiation at any moment in time would
therefore begin at various bases. This explains the existence of multiple initiation sites as
well as the collapse of the initiation pattern on changing the first two bases in the double
stranded region of the tailed templates (figure 6, Gel C, compare 12P and 15P).

Conclusions


1.  Original  Project  Objectives 

Several objectives were set forth in the original proposal to be pursued for
the duration of this fellowship. It is clear that most of the results needed to fulfill
the objectives have been achieved. These are listed and discussed below.

Objective 1. X-ray structure determination of RNA polymerase II at 6 Å resolution

This goal was achieved with the successful generation of a 5 Å electron density map
(16) and finally a 3.3 Å mainchain model of the enzyme (4). This goal was set in order to
allow for the generation of a model for molecular replacement. It was assumed a priori
that generating functional homogeneous elongation complex crystals would be a major
and difficult project in itself and that the native polymerase model would greatly simplify
the process of determining phases by molecular replacement.

Objective 2. X-ray structure determination of ternary complex at 3.5 Å resolution

Support of this research has allowed for the collection of complete data sets of
elongation complexes. The first, generated using the tailed template 9Pause, was complete
to 3.2 Å and is the C2 form, from the plate-like crystals. Although this was successful, many
crystals had to be grown to obtain the current data because the crystals are mechanically
weak. In addition only a few crystals diffracted to better than 4 Å. The second form is the
I222 crystal form, which is isomorphous with the native form 2 crystals. Diffraction from
this crystal form was superior to that of the C2 form in that most crystals diffracted to better
than 4 Å, and crystals were less prone to damage by the various techniques employed in
crystal manipulation. A complete data set to 3.1 Å was collected and some diffraction was
observed even beyond 3.1 Å.

It  must  be  noted  that  a  large  effort  was  made  not  only  in  the  field  of  x-ray 
crystallography,  but  also  on  the  level  of  the  biochemistry  of  generating  elongation 
complexes.  This  can  be  observed  in  part  from  results  included  in  this  document  whereby  a 
new  biochemical  system  was  devised  to  allow  for  the  generation  of  more  homogeneous 
elongation  complexes,  with  a  nearly  homogeneous  RNA  species. 

Although  this  project  has  not  been  completed,  it  is  indeed  very  close  to  completion. 
For  molecular  replacement  to  give  good  phase  information  the  initial  model  needs  to  be 
well  refined.  In  our  case,  the  initial  model  is  an  unrefined  mainchain  model,  which  could 
only  supply  limited  phase  information.  This  though  is  not  of  serious  consequence  because 
a  refined  model  with  sidechains  is  currently  being  generated.  Indeed  all  the  information  to 
complete  the  structure  is  at  hand  and  the  work  is  underway.  This  means  that  within  a 
short  period  of  time,  a  high  quality  model  will  be  available  for  molecular  replacement. 

In the meantime, molecular replacement with the current mainchain model and the
C2 diffraction data has yielded valuable information about the elongation complex. It is
clear that the clamp domain described in this document is a moveable domain. The clamp
domain forms one wall of a long nucleic acid channel. In the mainchain model it is found
in an open conformation (figures 3 and 4). This conformation is structurally compatible
with the entrance or disengaging of DNA. The clamp in the elongation complex, however,
is in a closed conformation (figures 3 and 4). In this conformation it moves closer to the
other wall of the nucleic acid cleft. When a DNA molecule is placed in the cleft, the clamp
in the elongation complex comes into direct contact with it and would prevent it from
disengaging. This is crucial since polymerase needs to be highly processive, retaining the
DNA for thousands of bases. In addition the clamp allows for some "breathing" (flexibility),
which is necessary for moving rapidly along the DNA template.

The results however were inconclusive as to the exact location of the nucleic acids. It
appears that there may be some movement of the polymerase on the DNA. As we have
observed from the biochemistry in this report, polymerase is quite dynamic. It is highly
mobile on its template. If there are multiple conformations then it would be difficult to
directly visualize the nucleic acids. Employing a refined model for molecular replacement
will allow for greatly improved phase information that could allow the nucleic acids to be
located with the current data. This should be shortly at hand. Secondly, in last year's
report, efficient generation of elongation complexes was observed employing
mercuri-CTP and/or brominated DNA templates. Since the zinc anomalous signal was easily
detected using phases from the molecular replacement with the mainchain model, it is
highly likely that the location of the DNA and RNA can be found using an anomalous
signal generated by the use of nucleic acids containing Hg or Br.

3.  Determining  the  structure  of  DNA  sequences  in  the  ternary  complex  caused  by  intrinsic 
pausing,  a  point  of  cellular  regulation  of  elongation  complexes 

Intrinsically paused complexes are complexes that, even in the presence of all four
nucleotides, are arrested and unable to elongate their RNA chain. Indeed in this report we
observe that with well designed templates nearly all the complex can arrest. The
sequence involved in inducing the arrest is the poly T tract added immediately after
the pause site (15). It does not appear though that arrest occurs at a single base in the
poly T tract but rather at ~3 different consecutive residues. An important
conclusion is that a new system needs to be defined to allow crystallization of intrinsically
arrested polymerase. Indeed, it may be very challenging to develop such a system.

4. Co-crystallization of ternary complex with TFIIS, one of the proteins that regulates
elongation at pause sites

TFIIS is an elongation factor that causes RNA polymerase II to cleave a small
portion of RNA in arrested elongation complexes, a mechanism which allows
polymerase to read through the arrest site. During the second year of research under
support of this grant, co-crystals of RNA polymerase II with TFIIS were grown and
diffracted. 60% of a complete data set to 3.6 Å was collected. Since then, it has been noted
that these crystals are quite anisotropic and 30-40% of the diffraction was limited to 5 Å.
Currently more crystals are being grown. During the duration of the Breast Cancer
Initiative support, co-crystals of TFIIS-polymerase were grown and their structural
determination seems quite promising.

Figures


Figure  1  Oligonucleotides  employed  for  generating  elongation  complexes 

Pause  Site 
* 

9  Pause  AAGACCAGGCATTTTTTCTTGTTGCGGAAGGGG 

CCCCCCCCCCCCTTCTGGTCCGTAAAAAAGAACAACGCCTTC 
- Tail -  - Upstream  Region - 

10  Pause  AAGACCAGGCATTTTTTCTTGTTGCGGAAGGGG

CCCCCCAAACCCTTCTGGTCCGTAAAAAAGAACAACGCCTTC 

11  Pause  AAGACCAGGCATTTTTTCTTGTTGCGGAA

CCCCCCCCCCCCTTCTGGTCCGTAAAAAAGAACAACGCCTT 

12 Pause  CACAC 

AAGACCAGGCATTTTTTCTTGTTGCGGAA 

CCCCCCAAACCCTTCTGGTCCGTAAAAAAGAACAACGCCTT 


13 Pause  CACAC 

AAGACCAGGCATTTTTTCTTGTTGCGGAA 

CCCCCCCCCCCCTTCTGGTCCGTAAAAAAGAACAACGCCTT 


14 Pause  CACAC 

AAGACCAGGCATTTTTTCTTGTTGCGGAA 

CCCCCCCAAACCTTCTGGTCCGTAAAAAAGAACAACGCCTT 

15 Pause  CACAC 

CCGACCAGGCATTTTTTCTTGTTGCGGAA 

CCCCCCAAACCCGGCTGGTCCGTAAAAAAGAACAACGCCTT 


17 Pause  Pause  Site 

GGC  * 

AAGACCATTCGGCGAAGAACAAGCAA 
CCGGTCTAAAACTTCTGGTAAGCCGCTTCTTGTTCGTT 
RNA  CCAGATTTT  --Upstream  Region — 
Figure 2. Crystallographic data for yeast RNA polymerase II and its
complexes

Figure 3. Elongation Complex Omit Map Confirms Clamp Domain Movement


The bulk of the RNA Polymerase II mainchain model is depicted in red as a stick model and the clamp domain of the model is
depicted as a yellow ribbon. The blue electron density of the elongation complex omit map (see text) in the region of the clamp
domain is also shown. The position of the clamp domain in the model (A) does not fit the elongation complex electron density
map, whereas the elongation complex zinc-aligned clamp domain (B) does fit into the electron density (see text).


Figure  4.  Small  Domain  is  a  DNA  Clamp  in  the  Elongation  Complex 


The bulk of the RNA Polymerase II mainchain model is depicted in red as a stick model and the clamp domain of the model is
depicted as a yellow ribbon. A B-form double stranded DNA in green was placed in the proposed DNA binding cleft. In A, the
clamp domain is positioned as it is in the original mainchain model and in B, the position of the clamp is that of the elongation
complex C2 crystal.


Figure 5. RNA Polymerase II is in dynamic motion, sliding
at the site of initiation on tailed templates.

[Gel images not reproduced. Panels: Gel A, Gel B. Lane labels: 9P, 10P, 11P; 9P, 10P, 12P; P, Ea.]

Tailed templates 9-13Pause (9P-13P) were employed for transcription reactions as previously
described (15), and paused in the presence of A, C, and GTP by withholding UTP (lanes P).
Transcripts were elongated by adding UTP before pausing (lanes Eb) or 15 minutes after
pausing (lanes Ea). RNA in gels migrates from top to bottom. In Gel A no significant difference
is observed when adding UTP before or after pausing.
Figure 6. RNA Polymerase II can be made to pause
at a single base on tailed templates.

Tailed templates 12-15Pause and 17Pause (shown in figure 1), employed for
transcription reactions as in figure 5, were paused in lanes P and elongated in
lanes E. Template 17Pause contains a nine base RNA primer whereas template
17P -RNA does not contain the RNA primer. To prevent read-through while pausing,
caused by residual contaminating UTP or misincorporation of nucleotides, the RNA chain
terminator deoxyUTP was employed (dU).

Figure 7. Consistently Better Quality Diffraction


Figure 8. I222 Elongation Complex crystals contain
polymerase that migrates as an intact elongation complex


PTE 


Polymerase (P), binary complex consisting of polymerase and template 17Pause (B), and
polymerase from elongation complex I222 crystals generated with template 17Pause (E)
were electrophoretically separated in a non-denaturing NuSieve agarose gel as previously
described (15). Nearly all the elongation complex polymerase migrates faster than
polymerase alone or binary complex. This confirms that the polymerase in the I222 crystals
maintains an elongation state prior to the freezing of crystals.
Architecture of RNA Polymerase II and Implications for the Transcription Mechanism

Patrick Cramer, David A. Bushnell, Jianhua Fu,
Averell L. Gnatt, Barbara Maier-Davis, Nancy E. Thompson,
Richard R. Burgess, Aled M. Edwards, Peter R. David,
Roger D. Kornberg*

A backbone model of a 10-subunit yeast RNA polymerase II has been derived
from x-ray diffraction data extending to 3 angstroms resolution. All 10 subunits
exhibit a high degree of identity with the corresponding human proteins, and
9 of the 10 subunits are conserved among the three eukaryotic RNA polymerases
I, II, and III. Notable features of the model include a pair of jaws, formed
by subunits Rpb1, Rpb5, and Rpb9, that appear to grip DNA downstream of the
active center. A clamp on the DNA nearer the active center, formed by Rpb1,
Rpb2, and Rpb6, may be locked in the closed position by RNA, accounting for
the great stability of transcribing complexes. A pore in the protein complex
beneath the active center may allow entry of substrates for polymerization and
exit of the transcript during proofreading and passage through pause sites in
the DNA.


RNA polymerase II (pol II), the central enzyme of gene expression, synthesizes all
messenger RNA in eukaryotes. The intricate regulation of pol II transcription underlies
cell growth and differentiation. The size and complexity of pol II befit this important role.
The best characterized form of the enzyme, that from the yeast Saccharomyces cerevisiae,
comprises 12 different polypeptides, with a total mass of about 0.5 megadaltons
(MD) (Table 1). The human enzyme must be virtually identical, as the human genes for all
subunits show a high degree of sequence conservation (Table 1), and at least 10 mammalian
pol II genes can be substituted for their counterparts in yeast (7).

Pol II is the core of the transcription machinery. On its own, it can unwind the DNA
double helix, polymerize RNA, and proofread the nascent transcript. In the presence of
additional proteins, it assembles even larger initiation and elongation complexes, capable
of promoter recognition and response to regulatory signals. A regulated initiation complex
comprises pol II, five general transcription factors, and a multiprotein Mediator (2-4).
It contains some 60 proteins, with a total mass of 3.5 MD. In transcription elongation
complexes, Mediator and some of the general transcription factors are replaced by SII
(TFIIS), Elongator, other elongation factors, and RNA processing proteins (i, 5, 6).

Determination of molecular models for the pol II transcription machinery has so far
been limited to a half dozen of the smallest proteins and protein fragments (7-17).
Detailed structural studies of the larger proteins and multiprotein complexes, essential for
understanding the mechanism and regulation of transcription, pose a more formidable
challenge. We report here the x-ray analysis of a 10-subunit yeast pol II. As nine of the
subunits are conserved among RNA polymerases I, II, and III (18), our findings provide a
basis for understanding the entire eukaryotic transcription machinery. They suggest roles
for each of the many subunits and give insight into the remarkable features of the
transcription mechanism.

Our investigation stemmed originally from the development of a yeast cell extract capable
of accurately initiated pol II transcription (19) and the development of a general method of
forming single-layer [two-dimensional (2D)] protein crystals (20). An active extract opened
the way to the isolation of functional pol II (21), whereas the 2D crystallographic approach
extended the reach of structure determination to such scarce, large, fragile multiprotein
complexes. The first 2D crystallization trials gave crystals too small and too poorly ordered
for structure determination (21). However, the ease and small amount of material required
for 2D crystallization allowed its use as a structural assay to guide the preparation of pol II
that would form better crystals. It soon emerged that heterogeneity of pol II, owing to
substoichiometric levels of two small subunits, Rpb4 and Rpb7, was an impediment to
crystallization. The problem was solved by the isolation of pol II from an RPB4 deletion
strain of yeast, yielding a "deletion" enzyme lacking both Rpb4 and Rpb7, which together
account for only 8% of the mass of the wild-type protein. The deletion enzyme, unimpaired
in transcription elongation and also fully active in transcription initiation when
supplemented with the missing subunits (22), formed exceptionally large, well-ordered
2D crystals (23). Structures of pol II alone, and complexed with general transcription
factors and nucleic acids, were determined by 3D reconstruction from electron micrographs
of 2D crystals to about 15 Å resolution (24-27). In the course of this work, it became
apparent that even at the low protein concentration used for 2D crystallization, typically
about 0.1 mg/ml, there was a tendency of the crystals to grow epitaxially, adding additional
layers in register with the first (23). This tendency was exploited by the use of 2D crystals
as seeds for growing 3D crystals (25), which are now readily obtained by conventional
methods as well.

X-ray diffraction from 3D crystals of pol II was initially undetectable. The problem proved
to be oxidation. Maintenance of an inert atmosphere during the final stages of protein
purification and throughout crystal growth, as well as improvements in crystallization
conditions, enabled the collection of diffraction data to 3.5 Å resolution (29). Because of
the great size of the protein and unit cell, only large heavy atom clusters, such as an
18-tungsten-atom cluster, could be used for initial phase determination. The validity of the
initial phases was shown by a close fit of the electron density map computed at 6 Å
resolution to the pol II map from 2D crystallography (29). There was only one deviation
between the two maps, which was attributed to movement of a protein domain, suggested
to clamp nucleic acid in a transcribing complex (29).

With a 6 Å phase set, it should have been possible to locate individual heavy atoms in
isomorphous derivatives and to extend structure determination to higher resolution. There
were, however, three major obstacles. First, diffraction to 3.5 Å resolution could not be
obtained reproducibly. Second, the crystals were nonisomorphous, varying by as much as
10 Å in one dimension of the unit cell. Very few crystals could be derivatized and matched
with an isomorphous native crystal. Because of the low abundance of pol II, approximately
10,000 liters of cell culture had to be processed to obtain the 6 Å electron density map, and
far more would have been required for extension to high resolution. The final obstacle was
that heavy atom compounds commonly used for protein phase determination destroyed
diffraction from the crystals.

A crystallographic backbone model for RNA polymerase II. These difficulties were
overcome in the present work by a soaking procedure that shrank the crystals to an apparent
minimum of the variable unit cell dimension (30). The resulting crystals were isomorphous
and diffracted isotropically to 3.0 Å resolution (31). Because the improved crystals were
nonisomorphous with the original crystals, initial phases were redetermined by multiple
anomalous dispersion (MAD) with a six-tantalum-atom cluster derivative, which showed a
single peak in difference Pattersons (Fig. 1) (32). These phases sufficed to reveal individual
heavy atoms in other crystals by means of cross-difference Fouriers (Fig. 1) (33). An
extensive search identified nonstandard mononuclear heavy atom compounds that gave
useful derivatives (Table 2) (34). Phases were determined by multiple isomorphous
replacement with anomalous scattering (MIRAS) from 10 data sets, ranging from 4.0 to
3.1 Å resolution (Table 2) (35). The resulting molecular envelope was in good agreement
with that previously obtained at 6 Å resolution (29). After solvent flattening, an electron
density map was obtained that revealed the course of the polypeptide chain and many amino
acid side chains (Fig. 2) (36).
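
For orientation, the resolution limits quoted above can be related to scattering angle through Bragg's law, λ = 2d sin θ. The sketch below is only an illustration and not part of the reported analysis: the function name is hypothetical, and the wavelength of 1.283 Å is the value listed for the native data set in Table 2.

```python
import math


def bragg_two_theta_deg(wavelength, d_spacing):
    """Scattering angle 2θ (degrees) from Bragg's law, λ = 2 d sin θ, first order.

    Both arguments must be in the same length unit (here angstroms).
    """
    s = wavelength / (2.0 * d_spacing)
    if not 0.0 < s <= 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


# Resolution range of the MIRAS data sets (4.0 to 3.1 Å), evaluated at the
# wavelength listed for the native data set in Table 2 (assumed here).
NATIVE_WAVELENGTH = 1.283
for d in (4.0, 3.1):
    print(f"d = {d} A -> 2θ = {bragg_two_theta_deg(NATIVE_WAVELENGTH, d):.1f} deg")
```

Extending data from 4.0 to 3.1 Å therefore means recording reflections at noticeably larger scattering angles, where intensities are typically weaker.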

Available structures of pol II subunits and subunit fragments, comprising 14% of all pol II
amino acid residues, were manually fit into the electron density (37). The complete
structures of yeast Rpb5 and Rpb8 were used, whereas structures of Escherichia coli and
archaebacterial homologs of yeast Rpb3, 6, 9, 10, and 11 were truncated to the conserved
regions (Table 1). In all cases, a unique fit of the subunit fold to regions of the electron
density map was observed. Subunit placement was facilitated by the location of eight zinc
ions, revealed by a zinc anomalous difference Fourier (Fig. 1 and Table 1). Most parts of the
yeast subunits missing from the homologous proteins could be modeled as polyalanine into
adjacent regions of electron density. The remaining density, about 70% of the total volume,
was attributed to the two large subunits, Rpb1 and Rpb2, with a minor contribution from the
smallest subunit, Rpb12. It was modeled as polyalanine fragments, with the use of standard
secondary structure elements wherever possible. Combination of phases from MIRAS and an
initial polyalanine model resulted in an improved map, which allowed adjustment and
extension of the model (38). The polyalanine fragments were assigned to Rpb1 or Rpb2 on
the basis of (i) the location of the active-site metal bound by Rpb1 (see below); (ii) two
zinc-binding motifs in the NH2-
[Figure panels not reproduced. (A) Tantalum difference Pattersons: isomorphous, anomalous. (B) Zinc anomalous difference Fouriers: tantalum MAD phases, final MIRAS phases.]

Fig. 1. Localization of heavy atoms. (A) Harker sections of isomorphous and anomalous difference
Patterson maps of the tantalum cluster derivative (Table 2). A single peak at the same position in
the two maps is observed. Heights of the Harker peaks in the isomorphous and anomalous
difference Pattersons were 6σ and 5σ, respectively. The resolution range of the data used is 40
to 5.5 Å. The contour levels are 3σ (background) and 1σ (steps). (B) Anomalous difference Fourier
calculated with native data collected at the zinc anomalous peak energy using initial tantalum MAD
phases (left) and final MIRAS phases (right). The projection of one asymmetric unit along the z axis
is shown for tantalum and MIRAS phases at a contour level of 3σ and 7σ, respectively, with 1σ
steps. The eight strong peaks correspond to structural zinc atoms (Table 1). The ninth peak corresponds
to the active site metal and likely arises from partial replacement of magnesium by zinc.

Table 1. Yeast RNA polymerase II subunits.

Subunit  Mass    Residues in     Identity to    Residues in     Zinc site (σ)(b)                         Surface
         (kD)    sequence        human (%)(a)   model (%)                                                cysteines(c)
Rpb1     191.6   1733 (1449)(e)  52             1213 (84)(e)    Zn6 (23.2), Zn8 (19.3), "Zn9" (9.8)(f)
Rpb2     138.8   1224            61             949 (78)        Zn7 (23.1)
Rpb3     35.3    318             46             264 (83)        Zn2 (30.2)                               Cys207
Rpb4     25.4    221             30
Rpb5     25.1    215             45             211 (98)                                                 Cys83
Rpb6     17.9    155             59             140 (90)
Rpb7     19.1    171             61
Rpb8     16.5    146             43             114 (78)                                                 Cys24, Cys36
Rpb9     14.3    122             37             117 (96)        Zn3 (27.5), Zn4 (26.4)
Rpb10    8.3     70              73             65 (93)         Zn1 (31.9)
Rpb11    13.6    120             50             110 (92)
Rpb12    7.7     70              43             36 (51)         Zn5 (24.7)
Total    513.6   4565            53             3219 (83)(g)    8+1

Structure used in modeling:

Subunit  Organism                              Protein                    Method  PDB code  Reference         Conserved residues(d)
Rpb3     E. coli                               α                          X-ray   1bdf      (69)              8-69, 162-180, 233-251
Rpb5     S. cerevisiae                         Rpb5                       X-ray   1dzf      (42)
Rpb6     Human                                 RPABC14.4                  NMR     1qkl      [ref. illegible]  79-154
Rpb8     S. cerevisiae                         Rpb8                       NMR     1a1d      (73)
Rpb9     Thermococcus celer                    Rpb9 COOH-term. domain     NMR     1qyp      (74)              67-112
Rpb10    Methanobacterium thermoautotrophicum  Rpb10 homolog              NMR     1ef4      (87)              3-55
Rpb11    E. coli                               α                          X-ray   1bdf      (69)              19-101

(a) Percentage of identical amino acid residues, for Rpb1 excluding the COOH-terminal domain.
(b) Peaks in the zinc anomalous difference Fourier shown in Fig. 1. Peaks are numbered according to their height, which is given in parentheses in multiples of the standard deviation. Zn6 and Zn8 are located in the NH2-terminal region of Rpb1. Zn7 is located in the COOH-terminal region of Rpb2. Zn3 is located in the NH2-terminal and Zn4 in the COOH-terminal domain of Rpb9.
(c) These exposed cysteine residues coincide with mercury sites in two independent derivatives [mercury, Table 2, and ethylmercuryphosphate (89)], confirming the modeling at several places.
(d) Conserved residues (yeast protein numbering) to which the model structure was truncated before placement in the electron density.
(e) The numbers in parentheses correspond to Rpb1 without the unstructured COOH-terminal domain (CTD).
(f) The ninth peak in the zinc anomalous difference Fourier corresponds to the active site metal and likely arises from partial replacement of the active site metal by zinc.
(g) The number in parentheses corresponds to the pol II mutant used in structure determination, which lacks subunits Rpb4 and Rpb7.
terminal region of Rpb1, connected by a linker of appropriate length; (iii) one zinc site in the
COOH-terminal region of Rpb2; and (iv) cross-linking of Rpb5 to the COOH-terminal region
of Rpb1 and of Rpb3 to residues 901 to 992 of Rpb2 (59).


The current backbone model comprises 8 polyalanine fragments for Rpb1, 10 fragments
for Rpb2, and major portions of all small subunits (Table 1). It accounts for the entire
molecular volume observed in the crystals and contains 3219 residues, about 83% of the
total, assuming all residues are ordered except the COOH-terminal domain of Rpb1.
Building of an atomic model is well advanced.
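
The 83% figure can be reproduced from the per-subunit values in Table 1. The following is a minimal sketch under stated assumptions: the dictionary layout and variable names are illustrative, Rpb4 and Rpb7 are given zero modeled residues because they are absent from the crystallized enzyme, and the length of the Rpb1 COOH-terminal domain is taken as the difference between the two Rpb1 residue counts listed in the table (1733 and 1449).

```python
# (mass in kD, residues in sequence, residues in model) per subunit, from Table 1.
SUBUNITS = {
    "Rpb1": (191.6, 1733, 1213),
    "Rpb2": (138.8, 1224, 949),
    "Rpb3": (35.3, 318, 264),
    "Rpb4": (25.4, 221, 0),   # absent from the crystallized enzyme
    "Rpb5": (25.1, 215, 211),
    "Rpb6": (17.9, 155, 140),
    "Rpb7": (19.1, 171, 0),   # absent from the crystallized enzyme
    "Rpb8": (16.5, 146, 114),
    "Rpb9": (14.3, 122, 117),
    "Rpb10": (8.3, 70, 65),
    "Rpb11": (13.6, 120, 110),
    "Rpb12": (7.7, 70, 36),
}
RPB1_WITHOUT_CTD = 1449  # Rpb1 residues excluding the COOH-terminal domain

total_mass = sum(mass for mass, _, _ in SUBUNITS.values())
total_residues = sum(seq for _, seq, _ in SUBUNITS.values())
modeled_residues = sum(model for _, _, model in SUBUNITS.values())

# Residues expected to be ordered in the 10-subunit enzyme: everything except
# Rpb4, Rpb7, and the Rpb1 COOH-terminal domain.
ctd_length = SUBUNITS["Rpb1"][1] - RPB1_WITHOUT_CTD
expected_ordered = (
    total_residues - SUBUNITS["Rpb4"][1] - SUBUNITS["Rpb7"][1] - ctd_length
)
modeled_fraction = modeled_residues / expected_ordered

print(f"total mass: {total_mass:.1f} kD, total residues: {total_residues}")
print(f"modeled: {modeled_residues} of {expected_ordered} "
      f"({100 * modeled_fraction:.0f}%)")
```

The column sums also serve as a consistency check on the table entries themselves.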

General architecture and DNA binding.
The two largest subunits, Rpb1 and Rpb2, form
distinct masses with a deep cleft between them

Fig. 2. Subunit structures determined previously or rebuilt here fitted to the experimental
pol II electron density. The solvent-flattened MIRAS electron density map (blue) is contoured
at 1.0σ. Experimental phases in the resolution range 40 to 3.1 Å were used to calculate the
map. In (A) and (B), the map was filtered with program MAPMAN to reduce noise (84). This map
facilitated fold recognition but appears to be at lower resolution, and side chain density is
largely removed. In (C), the original map is shown, which is noisier but reveals many details.
(A) Cα model of Rpb5 [black (47)] fitted to the density (blue). A loop that is involved in
packing against Rpb1 is in a different conformation in pol II than in the structure of free
Rpb5 (orange). Peaks of anomalous difference Fourier transforms of two mercury derivatives
(pink, yellow, both contoured at 5σ) coincide with the position of Cys83. (B) Cα traces of the
NMR structure of the Rpb10 homolog from Methanobacterium thermoautotrophicum [orange (87)]
fitted to the density (blue) and the rebuilt backbone model for yeast Rpb10 (black). The
location of the zinc ion in the NMR structure coincides with a strong peak in the zinc
anomalous Fourier (pink, contoured at 7σ). (C) One of the β strands in Rpb11 (black, residues
68 to 75) fitted to the density (blue). Distinct electron density is present for several side
chains. The model was obtained by placing the conserved core of E. coli α (69) and replacing
the side chains with those in yeast Rpb11 using the most common rotamer. This figure was
prepared with BOBSCRIPT (85) and MOLSCRIPT (86).


Table 2. Data collection and MIRAS phasing. Space group I222; unit cell 131 x 225 x 370 Å.

Data set               Beamline  Wavelength (Å)  Unique reflections  Completeness (%)  Rsym (%)
Native (zinc)          SSRL      1.283           98,315 (9,073)      99.2 (92.7)       8.4 (29.8)
Tantalum, peak         SSRL      1.2551          64,756 (5,724)      92.9 (82.9)       8.2 (25.7)
Tantalum, inflection   SSRL      1.2553          61,506 (5,682)      88.3 (82.3)       7.3 (27.3)
Tantalum, remote       SSRL      1.3776          64,624 (5,808)      92.8 (84.2)       6.6 (31.3)
Iridium-1a             ALS       1.105           89,734 (8,869)      99.5 (99.6)       6.3 (30.3)
Iridium-1b             SSRL      1.106           80,297 (6,397)      96.6 (79.0)       5.5 (21.4)
Iridium-2              SSRL      1.107           46,373 (4,540)      99.6 (99.5)       7.5 (24.7)
Mercury-a              SSRL      1.009           90,934 (9,064)      98.4 (99.0)       6.9 (33.8)
Mercury-b              SSRL      1.009           55,003 (5,143)      99.4 (94.3)       8.9 (25.6)
Rhenium-1              SSRL      1.181           45,791 (4,421)      98.4 (96.9)       9.6 (24.2)
Rhenium-2a             ALS       1.176           89,814 (8,818)      99.6 (99.0)       8.0 (34.3)
Rhenium-2b             ALS       1.176           79,025 (6,820)      96.0 (83.6)       6.4 (33.0)

Soaking time and concentration (in source order; the row assignment is not
recoverable): 22 h, 1 mM; 5 h, 5 mM; 22 h, 1 mM; 4 h, 1 mM; 6 h, 1 mM; 5 h, 20 mM;
5 h, 10 mM; 5 h, 5 mM.

Paired isomorphous/anomalous values for R_Cullis and phasing power (in source
order; the row and column assignment is not recoverable): 0.76/0.91; 0.76/0.77;
0.96/0.98; 0.92/0.98; 0.83/0.94; 0.99/0.90; 0.90/0.92; 0.88/0.62; -/0.79;
0.97/1.06; 1.69/0.94; 1.76/1.13; 0.59/0.50; 0.76/0.71; 0.99/0.38; 0.70/0.99;
0.87/1.03; 0.90/0.88.

MIRAS figure of merit (FOM) with resolution

Resolution (Å)    FOM (centrics)  FOM (acentrics)
39.24-8.58        0.616           0.801
8.58-6.14         0.690           0.810
6.14-5.03         0.689           0.743
5.03-4.37         0.609           0.621
4.37-3.91         0.562           0.555
3.91-3.57         0.557           0.514
3.57-3.31         0.524           0.440
3.31-3.10         0.255           0.192
Overall, 40-3.1   0.565           0.529

Numbers following the element names indicate different heavy atom compounds. Lowercase
letters indicate different soaking concentrations or soaking times, leading to
differences in the numbers and occupancies of heavy atom sites. Although data sets from
derivative pairs obtained in this way were correlated, additional phase information
could be extracted that proved crucial for obtaining an interpretable electron density
map. The heavy atom compounds used were as follows: tantalum, Ta6Br12(2+); iridium-1,
chloro-pentamethylcyclopentadienyl-1,2-bis(diphenylphosphino)ethane-iridium chloride;
iridium-2, pentamethylcyclopentadienyl-iridiumchloride dimer; mercury, Hg3N3C.gO,2H24,
a 1,3,5-triazine-based compound. Although the same compound, methyltrioxorhenium, was
used for rhenium-1 and rhenium-2, the observed binding sites differ, leading to
independent derivatives. We believe that the compound was altered with time in
solution, leading to a different chemical specificity. Tantalum and iridium-2
derivatives were found previously (29), and gave diffraction to higher resolution in
this study. SSRL, beamline 9-2 at the Stanford Synchrotron Radiation Facility; ALS,
beamline 5.0.2 at the Advanced Light Source at Berkeley.
Statistics for the highest resolution shell are given in parentheses. Mosaicity was
refined with SCALEPACK (82). $R_{\mathrm{sym}} = \sum_{i,h} |I(i,h) - \langle I(h)\rangle| / \sum_{i,h} |I(i,h)|$,
where $\langle I(h)\rangle$ is the mean of the $i$ observations of reflection $h$.
$R_{\mathrm{sym}}$ was calculated with anomalous pairs merged; no sigma cut-off was
applied. $R_{\mathrm{iso}}$ is computed from the isomorphous difference
$\sum |F_{PH} - F_{P}|$, where $F_{PH}$ and $F_{P}$ are the derivative and native
structure factor amplitudes, respectively. $R_{\mathrm{Cullis}}$, mean lack of closure
divided by the mean isomorphous/anomalous difference. Phasing power, mean value of
heavy atom structure factor amplitudes divided by the lack of closure. The numbers
given are for acentric reflections. These statistics were calculated with SHARP (85).
Owing to random orientation of the cluster, it was treated as a point scatterer and
data were used to only 4.5 Å resolution. The MAD data were used for initial phasing,
but only the peak wavelength data were used in the final MIRAS phasing.
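
The merging statistic defined in the Table 2 footnote can be illustrated with a minimal Python sketch. It groups repeated intensity observations by reflection index, sums the absolute deviations from each reflection's mean, and divides by the sum of absolute intensities. The function name `r_sym`, the `(hkl, intensity)` input layout, and the numbers in the usage lines are assumptions made for illustration only; they are not SCALEPACK's interface and not data from this study.

```python
from collections import defaultdict


def r_sym(observations):
    """Return R_sym for an iterable of (hkl, intensity) observations.

    R_sym = sum_{i,h} |I(i,h) - <I(h)>| / sum_{i,h} |I(i,h)|,
    where <I(h)> is the mean of the observations of reflection h.
    """
    # Collect all observed intensities for each reflection index.
    by_reflection = defaultdict(list)
    for hkl, intensity in observations:
        by_reflection[hkl].append(float(intensity))

    numerator = 0.0
    denominator = 0.0
    for intensities in by_reflection.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(abs(i) for i in intensities)

    if denominator == 0.0:
        raise ValueError("no non-zero intensities to merge")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical observations: two reflections, each measured twice.
    example = [
        ((1, 0, 0), 100.0),
        ((1, 0, 0), 110.0),
        ((0, 1, 0), 50.0),
        ((0, 1, 0), 50.0),
    ]
    print(f"R_sym = {100.0 * r_sym(example):.1f}%")
```

Multiplying the returned fraction by 100 gives the percentage form in which such values are usually tabulated.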
(Fig. 3). Each of the small subunits occurs in a
single copy, arrayed around the periphery. The
structure is cross-strutted by elements of Rpb1
and Rpb2 that traverse the cleft: A helix of
Rpb1 bridges the cleft, and the COOH-terminal
region of Rpb2 extends to the opposite side.
The Rpb1-Rpb2 complex is anchored at one
end by a subassembly of Rpb3, Rpb10, Rpb11,
and Rpb12.

The active site was located crystallographically by replacement of the catalytic
Mg2+ ion with Zn2+, Mn2+, or Pb2+ (40). A native zinc anomalous Fourier showed a
10-σ peak that likely results from partial replacement of the active site Mg2+ by
Zn2+ during protein purification (Fig. 1), and difference Fouriers obtained from
crystals soaked with either Mn2+ or Pb2+ showed a single peak at the same location
(41). The metal ion site occurs within a prominent loop of Rpb1 (Fig. 3), which, on
the basis of preliminary sequence assignment, harbors the conserved aspartate
residue motif (42). Only one catalytic metal ion was found, and only one was
reported for a bacterial RNA polymerase (43), although a two-metal ion mechanism,
as described for single-subunit polymerases (44), is not ruled out.

The location of duplex DNA downstream of the active site (ahead of the transcribing
polymerase) was previously determined by difference 2D crystallography of an
actively transcribing complex (27). Canonical B-form DNA placed in this location
lies in the Rpb1-Rpb2 cleft, and can follow a straight path to the active site
(Fig. 3). About 20 base pairs are readily accommodated between the edge of the
polymerase and the active site, consistent with nuclease digestion studies showing
the protection of about this length of downstream DNA (45). This proposal for the
pol II-DNA complex is also consistent with results of protein-DNA cross-linking
experiments: Rpb1 and Rpb5 cross-link to one side of the DNA and Rpb2 to the other;
and in the case of Rpb5, the cross-links are located about 5 to 15 base pairs
downstream of the active site (46).

Fig. 3. Architecture of yeast RNA polymerase II. Backbone models for the 10
subunits are shown as ribbon diagrams. Secondary structure has been assigned by
inspection. The three views are related by 90° rotations as indicated. Downstream
DNA, though not present in the crystal, is placed onto the ribbon models as 20 base
pairs of canonical B-DNA (blue) in the location previously indicated by electron
crystallographic studies (27). Eight zinc atoms (blue spheres) and the active site
magnesium (pink sphere) are shown (Table 1). The box (upper right) contains a key
to the subunit color code and an interaction diagram. The same views and color
coding are used throughout the article. This and other figures have been prepared
with RIBBONS (87).
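
The base-pair counts quoted for the downstream DNA can be converted to approximate distances with a short calculation. The sketch below assumes the textbook axial rise of about 3.4 Å per base pair for canonical B-form DNA, a value that is not stated in the text; the constant and function names are hypothetical and serve only this illustration.

```python
# Approximate axial rise per base pair of canonical B-form DNA, in angstroms.
# Textbook value assumed for this sketch; the source text does not state it.
B_DNA_RISE_ANGSTROM = 3.4


def duplex_length_angstrom(base_pairs, rise=B_DNA_RISE_ANGSTROM):
    """Return the approximate axial length of a straight duplex in angstroms."""
    if base_pairs < 0:
        raise ValueError("base_pairs must be non-negative")
    return base_pairs * rise


def window_distance_angstrom(first_bp, last_bp, rise=B_DNA_RISE_ANGSTROM):
    """Return (near, far) distances from the active site for a base-pair window."""
    if first_bp > last_bp:
        raise ValueError("first_bp must not exceed last_bp")
    return (duplex_length_angstrom(first_bp, rise),
            duplex_length_angstrom(last_bp, rise))


if __name__ == "__main__":
    # About 20 base pairs fit between the polymerase edge and the active site.
    print(f"20 bp of B-DNA: about {duplex_length_angstrom(20):.0f} angstroms")
    # Rpb5 cross-links map about 5 to 15 base pairs downstream.
    near, far = window_distance_angstrom(5, 15)
    print(f"Rpb5 cross-links: about {near:.0f} to {far:.0f} angstroms downstream")
```

Under this assumption, 20 base pairs span roughly 68 Å, and the Rpb5 cross-link window of 5 to 15 base pairs lies roughly 17 to 51 Å from the active site along the helix axis.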

Jaws position downstream DNA. Rpb5, and regions of Rpb1 and Rpb9 on the opposite
side of the Rpb1-Rpb2 cleft, form “jaws” that appear to grip the DNA (Fig. 4). Both
the upper and lower jaw may be mobile, opening and closing on the DNA. Mobility
within Rpb5 is suggested by comparison with the x-ray crystal structure of the
subunit alone (47). There was a nearly perfect fit of the subunit structure to the
corresponding region of the pol II electron density map (Fig. 2A), except for a
change in relative orientation of the NH2- and COOH-terminal domains, and a
conformational change of a loop in the COOH-terminal domain (Fig. 4B). The
solvent-exposed, NH2-terminal domain (residues 1 to 142) has apparently moved by as
much as 5 Å in the direction of DNA in the pol II cleft, relative to the position
in Rpb5 alone, with the COOH-terminal domain (residues 143 to 215) held fixed
against the body of Rpb1 (Fig. 4B). The observed position of the NH2-terminal
domain in pol II is defined by crystal contacts.


Residues in the Rpb5 loops facing the DNA are conserved (Fig. 4C). Two prolines
that are strictly conserved present their side chains to the DNA with a spacing and
relative orientation appropriate for contacting the DNA backbone. Proline residues
have been seen to interact with backbone ribose moieties of DNA in other crystal
structures (48, 49). Such nonspecific van der Waals interactions might favor a
particular rotational setting of the DNA, without greatly impeding the helical
screw rotation required to propel the DNA toward the active site and to unwind it
for transcription.

Other conserved residues of Rpb5 are located in the linker between the NH2- and
COOH-terminal domains and in the NH2-terminal helix (Fig. 4C). Since the linker is
not involved in subunit-subunit interactions, conserved residues might ensure a
directed movement of the NH2-terminal domain. Conserved residues in the
NH2-terminal helix form a positive charge cluster that is too far from DNA to
contact it directly, but might attract it through long-range interactions.

Rpb5 is likely to play a role in transcriptional activation (50). The NH2-terminal
domain of Rpb5 binds to the transactivation domain of the hepatitis B virus X
protein (51). Another Rpb5-interacting protein interferes with transactivation
(52). Some activators might function by enhancing jaw-DNA interaction, thereby
stabilizing transcription initiation or elongation complexes.

The upper jaw, formed by regions of Rpb1 and Rpb9, corresponds with a domain
previously shown to be mobile by 2D crystallography (53). Rpb9 is composed of two
zinc-binding domains separated by a 15-residue linker. A stretch of the linker adds
a β strand to a sheet in the Rpb1 region of the jaw. Rpb9 therefore buttresses
Rpb1, possibly constraining mobility of the jaw and strengthening its grip on DNA.
Mutations in Rpb9 alter the locations of transcription start sites (54-56), which
might be explained by a diminished grip on the DNA, or alternatively, by direct
Rpb9-DNA interaction before entry of the DNA into the Rpb1-Rpb2 cleft.

A clamp retains DNA. A second mobile element of pol II, previously revealed by
low-resolution structures and referred to as a “hinged” domain, was suggested to
clamp nucleic acids in the cleft (29). This element, here termed the “clamp,”
comprises NH2-terminal regions of Rpb1 and Rpb6 and the COOH-terminal region of
Rpb2 (Fig. 5). All three polypeptides enter at the base of the clamp near the
active site, allowing a degree of conformational freedom but not unrestricted
movement of the clamp. Within the Rpb6 region, 17 out of 42 residues are negatively
charged, forming a cluster near the bottom of the clamp. This region of Rpb6 is
also phosphorylated by casein kinase II, suggesting a regulatory role (57).

Fig. 4. Jaws. (A) Stereoview of structural elements constituting the jaws (left)
and the location of these elements within pol II (right). (B) Mobility of the
larger, NH2-terminal domain of Rpb5. Backbone models of free Rpb5 [gray (47)] and
Rpb5 in pol II (pink) are shown with their smaller, COOH-terminal domains
superimposed. (C) Conservation of amino acid residues of Rpb5.

The clamp forms one side of the Rpb1-Rpb2 cleft, where it may interact with the DNA
(and the DNA-RNA hybrid, see below) from the active site to about 15 residues
downstream. This DNA region corresponds with a double-stranded DNA binding site, 3
to 12 residues downstream of the active site, defined by biochemical analysis of
E. coli RNA polymerase (58-60). This binding site was referred to as a “sliding
clamp” because of its importance for the great stability of a transcribing complex
and processivity of transcription (60). Closure of the clamp over the DNA could
account for this stability. Such a movement of the NH2-terminal region of the
largest subunit was inferred from cross-linking studies of the E. coli enzyme (58).
Although the clamp is seen here in an open conformation, it is involved in crystal
contacts and the observed position is likely determined by the crystal lattice. The
electron density in this region is of lower quality than elsewhere in the map, and
the three zinc peaks associated with the region have the lowest heights (Zn6-8,
Table 1), also consistent with mobility of the clamp.

DNA-RNA hybrid binding site, RNA binding site. Transcribing polymerases have been
shown to harbor an unwound region of DNA, or “bubble,” within which is centered a
DNA-RNA hybrid of 8 or 9 base pairs, with the 3' or growing end of the RNA at the
active site (Fig. 6A) (60). Linear extension of duplex DNA placed in our
crystallographic model, to accommodate the DNA-RNA hybrid, is impossible because of
an element from Rpb2 blocking the path (Figs. 3, 4, and 6). This blocking element
corresponds with a “wall” of density previously noted in the structure of bacterial
RNA polymerase (43). Because of the wall, and because the active site lies well
beneath the level of the downstream DNA, the DNA-RNA hybrid must be tilted relative
to the axis of the downstream DNA (dashed line in Fig. 6C). The exact orientation
of the hybrid remains to be determined.

At the upstream end of the DNA-RNA hybrid (5' end of the RNA, remote from the
active site), the strands must separate. Biochemical studies show that the RNA
strand enters a binding site on the protein, extending from about 10 to 20
nucleotides upstream of the active site (61). There are two prominent grooves in
the pol II structure exiting the hybrid binding site, each of which could
accommodate one, but not two, nucleic acid strands. One groove winds around the
base of the clamp (Fig. 7, groove 1). The other is between the lower part of the
wall and Rpb1, and continues downward between Rpb1 and Rpb11 (Fig. 7, groove 2). We
favor groove 1 as the RNA binding site for three reasons. First, the length and
location of the groove are appropriate for binding a region of RNA 10 to 20
nucleotides from the active site, in agreement with biochemical studies. Second,
the RNA path would lead back toward the downstream DNA, ending in close proximity
to the NH2-terminal region of Rpb1 (defined by a zinc site). This path would accord
with the reported cross-linking of RNA about 20 nucleotides upstream of the active
site to the NH2-terminal region of the largest subunit of E. coli RNA polymerase
(58-60). Finally, RNA in the groove at the base of the clamp could explain the
great stability of transcribing complexes. The affinity of the polymerase for the
DNA template is coupled to the presence of an RNA transcript (60). We speculate
that closure of the clamp over DNA, assuring its retention in a transcribing
complex, would enlarge the groove at the base of the clamp, and subsequent binding
of RNA in the groove would prevent the clamp from reopening. RNA would act as a
lock on the closed conformation of the clamp.

Fig. 5. Clamp. Structural elements constituting the clamp and their location in
pol II are shown. The COOH-terminal region of Rpb2 and the NH2-terminal region of
Rpb1 bind one and two zinc ions, respectively (blue spheres). The NH2-terminal tail
region of Rpb6 extends from its main body (at the bottom in the front view) into
the clamp. The direction of movement of the clamp revealed by comparison with
electron crystal structures (29) is indicated (double-headed red arrow).

Fig. 6. Topology of the polymerizing complex, and location of Rpb4 and Rpb7. (A)
Nucleic acid configuration in polymerizing (top) and backtracking (bottom)
complexes. (B) Structural features of functional significance and their location
with respect to the nucleic acids. A surface representation of pol II is shown as
viewed from the top in Fig. 3. To the surface representation has been added the
DNA-RNA hybrid, modeled as nine base pairs of canonical A-DNA (DNA template strand,
blue; RNA, red), positioned such that the growing (3') end of the RNA is adjacent
to the active site metal and clashes with the protein are avoided. The exact
orientation of the hybrid remains to be determined. The nontemplate strand of the
DNA within the transcription bubble, single-stranded RNA and the upstream DNA
duplex are not shown. (C) Cutaway view with schematic of DNA (blue) and with the
helical axis of the DNA-RNA hybrid indicated (dashed white line). An opening in the
floor of the cleft that binds nucleic acid exposes the DNA-RNA hybrid (pore 1) to
the inverted funnel-shaped cavity below. The plane of section is indicated by a
line in (B), and the direction of view perpendicular to this plane (side) is as in
Fig. 3. (D) Surface representation as in (B), with direction of view as in (C). The
molecular envelope of pol II determined by electron microscopy of 2D crystals at
16 Å resolution is indicated (yellow line), as is the location of subunits Rpb4 and
Rpb7 (arrow, Rpb4/7), determined by difference 2D crystallography (25).

Mobility of the clamp may also be modulated by interactions with other pol II
subunits and transcription factors, for example, Rpb4 and Rpb7. Although these two
small subunits were absent from the form of pol II analyzed here, their approximate
location is known from electron microscopy of 2D crystals (25). A surface
representation of the crystallographic backbone model corresponds closely with the
molecular envelope from 2D crystals (Fig. 6D). On this basis, Rpb4 and Rpb7 occupy
a crevice in the surface between the lower jaw and the clamp (Fig. 6D). Interaction
with either of these mobile elements or with downstream DNA could underlie the
requirement for Rpb4 and Rpb7 for the initiation of transcription (22).


A funnel for substrate entry, backtracking, and elongation factor access. The floor
of the Rpb1-Rpb2 cleft, which supports duplex DNA and the DNA-RNA hybrid, is very
thin and perforated, exposing the nucleic acids to the space below. The perforation
is bisected by the helix that forms a bridge between Rpb1 and Rpb2, creating two
pores, one of which lies beneath the active site (pore 1) and the other, beneath
the downstream DNA (pore 2). Both pores are about 12 Å in diameter and lie at the
apex of an inverted funnel-shaped cavity, which increases to about 30 Å in diameter
at the opposite side of pol II (Fig. 7, bottom). As the Rpb1-Rpb2 cleft is occupied
by duplex DNA and the DNA-RNA hybrid during transcription, nucleotides may be
unable to enter above the active site and may instead gain access from below,
through the funnel and pore 1, as previously suggested for both pol II and
bacterial RNA polymerase (29, 43).

The funnel and pore 1 may play similar roles in other aspects of transcription.
Bacterial and eukaryotic RNA polymerases oscillate between forward (polymerization)
and backward (backtracking) movement during transcription (Fig. 6A) (60).
Backtracking is important for proofreading and for traversing obstacles such as DNA
damage, bound proteins, or natural pause sites in the DNA. During backtracking, the
polymerase and associated transcription bubble move backward along both the DNA and
the RNA. The region engaged in the DNA-RNA hybrid retreats like a zipper, releasing
the 3' end of the RNA in single-stranded form, and incorporating single-stranded
RNA on the 5' side of the transcription bubble into the hybrid (Fig. 6A). As
mentioned above for access of nucleotides to the active site during polymerization,
duplex DNA and hybrid in the Rpb1-Rpb2 cleft may block release of the 3' end of the
RNA into the cleft during backtracking. Rather, as suggested for entry of
nucleotides, the 3' end of the RNA may exit through the funnel and pore 1.

Backtracking beyond a certain point can result in an arrested complex, unable to
reverse direction, to restore the 3' end of the RNA to the active site, and to
resume transcription (60). We speculate that when a certain length of RNA has been
extruded by backtracking, it may interact with a site in the funnel and be trapped,
preventing reversal and recovery. For recovery from arrest, cleavage of the RNA is
required to generate a new 3' end at the active site (60). This cleavage is
achieved with the help of transcript cleavage factors (62, 63). The funnel and
pore 1 may provide access for such factors, for example, TFIIS. A small
zinc-binding domain of TFIIS has an extended β hairpin at one end with two
conserved residues that come near the active site of pol II and that are critical
for RNA cleavage (15, 16, 64-66). Also included are tryptophan and arginine side
chains involved in nucleic acid binding (67, 68). Modeling shows that this domain,
only 20 Å in diameter, can be accommodated in pore 1 with the two conserved
β hairpin residues reaching the active site, while still leaving room for an
extruded strand of RNA.

Comparison with bacterial RNA polymerase. Most information about core bacterial RNA
polymerase structure comes from x-ray diffraction studies of the α2 homodimer from
E. coli (69) and the polymerase from Thermus aquaticus (43). Regions of sequence
similarity have been noted between α, Rpb3, and Rpb11 (69), between β and Rpb2
(70), and between β′ and Rpb1 (71). The crystallographic pol II model contains a
conserved core of secondary structural elements similar to those in the bacterial
enzyme, surrounded by divergent elements and eukaryote-specific subunits. Conserved
elements are located in the vicinity of the DNA-RNA hybrid binding site, the
adjacent downstream DNA binding site, and the sides of the funnel. Consistent with
the conservation of these structural elements, similar modes of interaction with
nucleic acids in the vicinity of the active site have been proposed for the
eukaryotic and bacterial enzymes (72). The pore beneath the active site is
conserved, and the bacterial enzyme may contain a clamp as well (73). On the other
hand, the jaws, which include eukaryote-specific subunits and a domain of Rpb1, are
found only in pol II, possibly reflecting their interaction with the
eukaryote-specific transcription initiation factor TFIIE, as revealed by 2D
crystallography (26). The occurrence of jaws in pol II, but not in the bacterial
enzyme, presumably accounts for the nuclease protection of about 20 base pairs of
downstream DNA by pol II, compared with only about 13 base pairs by the bacterial
enzyme (45, 60).

Fig. 7. Possible RNA exit grooves and funnel beneath the active site. The model of
Fig. 6B is shown in two perpendicular directions of view (side, back), and also
viewed from the opposite side (bottom). To the side and back views have been added
dashed lines corresponding to about 10 nucleotides of RNA, lying in well-defined
grooves leading away from the hybrid-binding region (groove 1, red; groove 2,
orange). The nontemplate strand of the DNA within the transcription bubble and the
upstream DNA duplex are not shown. To the bottom view has been added a solid line
indicating the rim of the funnel-shaped cavity.

A more detailed comparison is possible, at present, for the α2 dimer and its
counterpart in pol II, the Rpb3-Rpb11 heterodimer. The dimer nucleates assembly of
bacterial polymerase, binding β to form a subcomplex, which then binds β′ to form a
complete core enzyme (74). Similarly, the Rpb3-Rpb11 heterodimer binds Rpb2 to form
a subcomplex (75). The location of the heterodimer in pol II is similar to that of
α2 in the bacterial enzyme, and the domain conserved between Rpb3, Rpb11, and α
exhibits an identical fold (motif of α helices and β sheets forming the lower half
of the subcomplex in Fig. 8). The conserved domain represents almost the entirety
of Rpb11 and is responsible for Rpb3-Rpb11 interaction (or dimerization in the case
of α). The nonconserved domain of Rpb3 (upper half of the subcomplex in Fig. 8)
interacts with the eukaryote-specific subunits Rpb10 and Rpb12. Contact of Rpb10
with Rpb3 is consistent with biochemical evidence for a stable Rpb3-Rpb11-Rpb10
subcomplex (76). Rpb12 binds through a tail, which adds a β strand to a sheet in
the nonconserved region of Rpb3. Rpb12 also interacts with Rpb2 through its
zinc-binding module. Consistent with this, Rpb12 has been shown to contact the
second largest subunit in RNA polymerase I, and this interaction requires an intact
zinc-binding motif (77). Moreover, a mutation in the COOH-terminal region of Rpb12
impairs assembly of RNA polymerase III (77). Thus, Rpb12 appears to play an
essential role in the assembly or maintenance of all eukaryotic RNA polymerases by
bridging between the Rpb3-Rpb11-Rpb10 subcomplex (or its homologs in polymerases I
and III) and the second largest subunit.

Transcription pathway. The crystallographic model of pol II also gives insight into
the transcription pathway and the still larger multiprotein complexes involved. The
pathway begins with the formation of a TFIIB-TFIID-promoter DNA complex and its
interaction with pol II, followed by entry of TFIIE, and finally TFIIH, whose
helicase activities melt DNA around the start site of transcription. The initial
interaction of pol II with the promoter must be with essentially straight, duplex
DNA. The pol II model, however, requires a considerable distortion for binding at
the active site, which can only occur upon melting. The transition from an initial
complex to a transcribing complex will therefore be accompanied by structural
changes and movement of the DNA. Transcription begins with the repeated synthesis
and release of short RNAs (“abortive cycling”), until a barrier at about 10
nucleotides is traversed, and chain elongation ensues. On reaching a transcript
size of about 20 nucleotides, the full stability of a transcribing complex is
attained. The barrier at 10 nucleotides corresponds to the point at which the 5'
end of the growing transcript must disengage from the template DNA and enter the
proposed groove for RNA in the model. The transcript size needed for full stability
corresponds with the length of RNA needed to fill the groove.
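
The two transcript-length thresholds in this paragraph can be summarized as a small classification function. This is only a sketch of the proposal as stated: the text gives the thresholds as "about 10" and "about 20" nucleotides, so the sharp cut-offs, the constant names, and the stage labels below are simplifying assumptions rather than measured boundaries.

```python
# Approximate thresholds taken from the text ("about 10" and "about 20" nucleotides).
# Treating them as sharp cut-offs is an assumption made for this illustration.
ABORTIVE_BARRIER_NT = 10
FULL_STABILITY_NT = 20


def transcript_stage(length_nt):
    """Classify a transcribing complex by the length of its RNA transcript."""
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    if length_nt < ABORTIVE_BARRIER_NT:
        # Short RNAs are repeatedly synthesized and released.
        return "abortive cycling"
    if length_nt < FULL_STABILITY_NT:
        # The 5' end has left the hybrid and is entering the proposed RNA groove.
        return "elongation, groove filling"
    # The proposed groove is filled and the complex has reached full stability.
    return "fully stable complex"


if __name__ == "__main__":
    for length in (5, 12, 25):
        print(length, "nt:", transcript_stage(length))
```

The middle stage corresponds to the interval in which, according to the model, the 5' end of the transcript occupies progressively more of the groove at the base of the clamp.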

The interpretation along these lines may be extended and evaluated by the solution
of pol II cocrystal structures, with the use of the pol II model for molecular
replacement. Cocrystals with TFIIB and TFIIE (78) should reveal the trajectory of
DNA in the initial pol II-promoter complex. Cocrystals containing pol II in the act
of transcription (79) will show the locations of nucleic acids in an elongation
complex. Cocrystals with TFIIS (80) may indicate the proposed exit pathway for RNA
through a pore beneath the active site during backtracking. Other cocrystals may be
sought to investigate the mechanism of transcriptional regulation by the
multiprotein Mediator complex and associated activator and repressor proteins (4).
est protostars; they are surrounded by large
and dusty envelopes that feed the central
objects and their protoplanetary disks. These
sources undergo violent ejection of matter
related to accretion processes. The shock
waves created when the protostellar ejecta
collides with the surrounding gas produce the
Herbig-Haro (HH) jets observed at optical
wavelengths. These jets seem to drive the
bipolar molecular outflows (4-6) detected
around protostars and represent a second
mass loss-driven phenomenon taking place
during the earliest evolutionary stages of the
Windows Through the Dusty
Disks Surrounding the Youngest
Low-Mass Protostellar Objects

J. Cernicharo, A. Noriega-Crespo, D. Cesarsky, B. Lefloch,
E. Gonzalez-Alfonso, F. Najarro, E. Dartois, S. Cabrit

The formation and evolution of young low-mass stars are characterized by
important processes of mass loss and accretion occurring in the innermost
regions of their placentary circumstellar disks. Because of the large obscuration
of these disks at optical and infrared wavelengths in the early protostellar stages
(class 0 sources), they were previously detected only at radio wavelengths using
interferometric techniques. We have detected with the Infrared Space Observatory
the mid-infrared (mid-IR) emission associated with the class 0 protostar
VLA1 in the HH1-HH2 region located in the Orion nebula. The emission arises
in three wavelength windows (at 5.3, 6.6, and 7.5 micrometers) where the
absorption due to ices and silicates has a local minimum that exposes the
central part of the young protostellar system to mid-IR investigations. The
mid-IR emission arises from a central source with a diameter of 4 astronomical
units at an averaged temperature of ~700 K, deeply embedded in a dense region
with a visual extinction of 80 to 100 magnitudes.